Lukas Lehner
2024-11-14
Source: illustrations by @allison_horst
You all have used version control previously:
Git is a sophisticated form of version control. Git…
Add a “Usage” and “Contributing” section to your README.md
https://github.com/lukaslehner/Zurich_2024_workflows_workshop/tree/main
https://github.com/lukaslehner/Zurich_2024_workflows_training
Push rejected. This can happen if the remote has commits that your local repo does not have yet. Solution: pull first, resolve any conflict, then try your push again.
fatal: not a git repository. The command cannot be executed because the current directory is not inside a Git repository. Solution: initialize the repo or change directory to the repo.
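Both errors can be reproduced and fixed in a throwaway sandbox. The following is a minimal sketch, not part of the workshop material: it assumes git is installed, uses a local bare repository as a stand-in for GitHub, and all names (`plain`, `remote.git`, `alice`, `bob`, the file names) are hypothetical.

```shell
#!/bin/sh
# Sandbox demo; every path and name below is hypothetical.
set -eu
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
sandbox=$(mktemp -d)

# --- Error 1: "fatal: not a git repository" ---
mkdir "$sandbox/plain"
if ! git -C "$sandbox/plain" status >/dev/null 2>&1; then
    # Not a repo yet: initialize it (or cd into an existing repo instead).
    git init --quiet "$sandbox/plain"
fi
git -C "$sandbox/plain" status --short   # now succeeds

# --- Error 2: push rejected ---
# A bare repo stands in for GitHub; alice and bob are two clones of it.
git init --quiet --bare "$sandbox/remote.git"
git clone --quiet "$sandbox/remote.git" "$sandbox/alice" 2>/dev/null
echo "# Project" > "$sandbox/alice/README.md"
git -C "$sandbox/alice" add README.md
git -C "$sandbox/alice" commit --quiet -m "Add README"
git -C "$sandbox/alice" push --quiet origin HEAD
git clone --quiet "$sandbox/remote.git" "$sandbox/bob"
branch=$(git -C "$sandbox/bob" symbolic-ref --short HEAD)

# Alice pushes a new commit to the remote ...
echo "alice's notes" > "$sandbox/alice/alice.txt"
git -C "$sandbox/alice" add alice.txt
git -C "$sandbox/alice" commit --quiet -m "Add Alice's notes"
git -C "$sandbox/alice" push --quiet origin HEAD

# ... while Bob commits locally without having pulled.
echo "bob's notes" > "$sandbox/bob/bob.txt"
git -C "$sandbox/bob" add bob.txt
git -C "$sandbox/bob" commit --quiet -m "Add Bob's notes"

push_rejected=no
if ! git -C "$sandbox/bob" push --quiet origin HEAD 2>/dev/null; then
    push_rejected=yes
    # Solution: pull first (here as an explicit merge), then push again.
    git -C "$sandbox/bob" pull --quiet --no-rebase --no-edit origin "$branch"
    git -C "$sandbox/bob" push --quiet origin HEAD
fi
echo "push rejected at first: $push_rejected"
git --no-pager -C "$sandbox/bob" log --oneline --graph
```

In this sketch the two commits touch different files, so the pull merges cleanly. If both had edited the same lines, git would stop at the pull and ask you to resolve the conflict before committing and pushing.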
Commit early and often.
Push to your remote on GitHub often (but not as often as you commit).
Establish a naming convention for commits.
Use tags to mark key steps.
Fork other people’s repos and clone your fork (instead of “just cloning”).
Branch off your development version, especially in teams.
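Several of these practices can be tried out locally. The following is a rough sketch under stated assumptions: the repository location, file names, tag name `v0.1`, branch name `robustness-checks`, and the `<type>: <summary>` commit-message convention are all hypothetical choices for illustration, not a convention prescribed by the workshop. Forking and pushing need a GitHub remote and are left out.

```shell
#!/bin/sh
# Hypothetical repo, file names, tag name, and commit-message prefixes.
set -eu
export GIT_AUTHOR_NAME="Workshop User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# Commit early and often: one small, self-contained change per commit.
# Naming convention assumed here: "<type>: <summary>".
echo "# Analysis" > README.md
git add README.md
git commit --quiet -m "docs: add README"

echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "feat: add first analysis script"

# Tag a key step, for example the version sent to co-authors.
git tag -a v0.1 -m "First complete draft"

# Branch off for experimental work so the main line stays stable.
git checkout --quiet -b robustness-checks
echo "x <- 2" >> analysis.R
git commit --quiet -am "feat: add robustness check"

# Show the history with branch and tag labels.
git --no-pager log --oneline --decorate
```

With a remote configured, `git push` would publish the commits and `git push --tags` would publish the tag.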
Happy Git intro for data science
GitHub Skills replaced GitHub Learning Lab from 1 Sept 2022.
GitHub Pro is free for students (sign up for the GitHub Student Developer Pack).
GitHub Teacher Toolbox is free for course instructors.
Code and Data for the Social Sciences: A Practitioner’s Guide by Matthew Gentzkow and Jesse M. Shapiro.
Generally, git operates through a shell. (Later on, we will install a GUI that can make life easier.)
A shell (or terminal) is a program on your computer whose job is to run other programs, rather than do calculations itself.
Let’s start by opening the shell in RStudio: Tools > Shell.
A note for Windows users: the default Windows shell does not support git commands out of the box. We can solve this by installing Git Bash, a lightweight shell that does support git commands.
Basic shell commands: https://cfss.uchicago.edu/setup/shell/
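A few of the most common shell commands, as a minimal sketch to try in a POSIX-style shell such as Git Bash or the macOS/Linux terminal. The directory and file names (`project`, `data`, `notes.txt`) are hypothetical and are created in a temporary location so nothing on your machine is touched.

```shell
#!/bin/sh
# Hypothetical directory and file names, created in a temporary location.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

pwd                       # print the current (working) directory
mkdir -p project/data     # create a directory and a subdirectory
cd project                # change into the new directory
touch notes.txt           # create an empty file
ls -a                     # list contents, including hidden files
cd ..                     # go back up one level
git --version             # check that git is reachable from this shell
```

If the last command reports a version number, the shell can find git and you are ready to run git commands in it.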
Git is the command line version control system (VCS) software, which works on your local computer.
GitHub is an internet hosting service for git repositories.
GitHub Desktop is an application that enables you to interact with GitHub using a GUI instead of the command line or a web browser.
Replicability refers to situations in which a researcher obtains new data to reach the same scientific conclusions as a previous study, whereas reproducibility refers to situations in which the original researcher’s software, code, and data are used to regenerate the results.
✅ Replication standards: guidelines, protocols, and software designed to help researchers share, analyze, archive, preserve, distribute, catalog, translate, verify, and replicate scholarly research data and analyses across disciplines. Includes proposals to improve the norms around data sharing and replication in scientific research.
Science is not built upon blind trust, but on verifiability: science as “organized skepticism” (Merton, 1947). Only when raw data and other research materials are shared can such organized skepticism be implemented and science self-correct. One aspect of good scientific practice is Open Data.
Reliable infrastructure for storage and publication (e.g., subject-specific repositories, institutional repositories)
Plan S principle: “from 2021, scientific publications that result from research funded by public grants must be published in compliant Open Access journals or platforms.” (Sherpa Romeo database; fairsharing.org)
Tools for Efficient Research Workflows